Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster Expression parser / Tokenizer with StringScanner #1844

Merged
merged 33 commits into from
Jan 10, 2025

Conversation

ggmichaelgo
Copy link
Contributor

@ggmichaelgo ggmichaelgo commented Nov 2, 2024

Faster Tokenizer & Expression Parser

Inspired by @tenderlove 's Rails World 2024 Closing Keynote and
@ianks 's #1832 PR ❤️
This PR updates the Expression Parser and Tokenizer to use the new StringScanner to improve the performance.

Faster Expression Parser!

Here are the performance improvements by using Expression2

Expression Type Baseline (iter/s) Optimized (iter/s) Improvement (%)
String 1,642,330.4 1,618,752.0 same-ish
Literal 464,185.8 1,439,419.0 210% faster
Variable 170,784.0 1,130,120.6 562% faster
Number 142,954.2 385,356.9 170% faster
Range 57,236.9 570,767.7 897% faster
All 27,806.2 115,904.0 317% faster

Benchmark Improvement

main branch (baseline)

ruby 3.4.0dev (2024-04-23T16:59:11Z v3.4.0-pshopify-pr.. 70b931820e) +YJIT [arm64-darwin23]
Warming up --------------------------------------
              parse:    14.000 i/100ms
             render:    70.000 i/100ms
     parse & render:    11.000 i/100ms
Calculating -------------------------------------
              parse:    142.958 (± 1.4%) i/s    (7.00 ms/i) -      2.870k in  20.081552s
             render:    704.996 (± 1.7%) i/s    (1.42 ms/i) -     14.140k in  20.064078s
     parse & render:    114.978 (± 0.9%) i/s    (8.70 ms/i) -      2.310k in  20.091266s

fast-expression-parse branch

ruby 3.4.0dev (2024-04-23T16:59:11Z v3.4.0-pshopify-pr.. 70b931820e) +YJIT [arm64-darwin23]
Warming up --------------------------------------
           tokenize:   356.000 i/100ms
              parse:    18.000 i/100ms
             render:    70.000 i/100ms
     parse & render:    14.000 i/100ms
Calculating -------------------------------------
           tokenize:      3.522k (± 1.6%) i/s  (283.89 μs/i) -     70.488k in  20.017506s
              parse:    188.257 (± 0.5%) i/s    (5.31 ms/i) -      3.780k in  20.080262s
             render:    708.969 (± 0.7%) i/s    (1.41 ms/i) -     14.210k in  20.044211s
     parse & render:    142.434 (± 2.1%) i/s    (7.02 ms/i) -      2.856k in  20.065080s

This PR improves the parsing speed by ~24%, and parse & render speed by ~19%.

More Benchmark Results!

Here is another benchmarking with a larger and complex theme:
image

By using the LRU caching, Liquid is creating ~20% less objects, the parsing time is ~18% faster, and render time is ~9% faster.

With production traffic, we are seeing ~14% parsing time improvement:

  Baseline (staging-liquid-2) Experiment (staging-liquid-1) Diff
Average 19.9ms 17.4ms ~14% improvement
p99 82ms 71ms ~15% improvement
p95 52ms 45ms ~16% improvement
p90 42ms 36ms ~17% improvement
p75 28ms 24ms ~17% improvement
p50 17ms 15ms ~13% improvement

@ggmichaelgo ggmichaelgo force-pushed the fast-expression-parse branch 4 times, most recently from 7538bcb to e912bb0 Compare November 5, 2024 19:49
@ggmichaelgo ggmichaelgo marked this pull request as ready for review November 5, 2024 19:51
@ggmichaelgo ggmichaelgo force-pushed the fast-expression-parse branch 3 times, most recently from 10ef505 to 8759798 Compare November 8, 2024 01:41
@ggmichaelgo ggmichaelgo marked this pull request as draft November 13, 2024 15:50
@ggmichaelgo ggmichaelgo force-pushed the fast-expression-parse branch 4 times, most recently from 49e55a6 to e0d54cc Compare November 14, 2024 19:33
@ggmichaelgo ggmichaelgo changed the title Fast expression parse Faster Expression parser / Tokenizer with StringScanner Nov 14, 2024
@ggmichaelgo ggmichaelgo force-pushed the fast-expression-parse branch 2 times, most recently from 7419a44 to c0e45c1 Compare November 14, 2024 21:38
@ggmichaelgo ggmichaelgo marked this pull request as ready for review November 15, 2024 03:49
@ggmichaelgo ggmichaelgo requested review from ianks, a team and Youssef-Mak November 15, 2024 03:49
@ggmichaelgo ggmichaelgo force-pushed the fast-expression-parse branch from 1c8460c to e6786f6 Compare November 15, 2024 21:24

def parse_number(markup)
ss = string_scanner
ss.string = markup

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it might be possible to make this method faster by not using string scanner and manually iterating over the bytes in the string. It looks like markup is a string and we're just setting it on the string scanner.

Just as a rough outline, something like this:

idx = 0
byte = markup.getbyte(idx)

return false if return false if byte != DASH && byte != DOT && (byte < ZERO || byte > NINE)

while (byte = markup.getbyte(idx) && idx += 1) # get the current byte, advance the cursor
  return false if byte != DOT && (byte < ZERO || byte > NINE)

 ...
end

It's kind of a pain to manage the cursor position yourself, but we'll actually create more JIT friendly code this way. I'd recommend using StringScanner only when we need to scan through a large buffer and use complex matches (IOW, regular expressions). We might be able to be even faster with StringScanner in the future, but for now I think managing a cursor yourself will be faster (in this case).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!
I agree that StringScanner doesn't fit for this use case 👍

Unfortunately, I didn't get to see any performance improvement as much as I hoped to see... (roughly 2% diff by keeping the cursor in Ruby ☹️)

(10 seconds warmup / 20 seconds run)
Baseline

ruby 3.4.0dev (2024-04-23T16:59:11Z v3.4.0-pshopify-pr.. 70b931820e) +YJIT [arm64-darwin23]
Warming up --------------------------------------
Liquid::Expression2#parse: number
                        39.927k i/100ms
Calculating -------------------------------------
Liquid::Expression2#parse: number
                        405.480k (± 0.4%) i/s    (2.47 μs/i) -      8.145M in  20.087950s

Without StringScanner

ruby 3.4.0dev (2024-04-23T16:59:11Z v3.4.0-pshopify-pr.. 70b931820e) +YJIT [arm64-darwin23]
Warming up --------------------------------------
Liquid::Expression2#parse: number
                        39.051k i/100ms
Calculating -------------------------------------
Liquid::Expression2#parse: number
                        395.787k (± 0.3%) i/s    (2.53 μs/i) -      7.927M in  20.029537s
Source code
def parse_number(markup)
  # the first byte must be a digit, a period, or  a dash
  byte = markup.getbyte(0)
  return false if byte != DASH && byte != DOT && (byte < ZERO || byte > NINE)

  is_integer = true
  last_dot_pos = nil
  num_end_pos = nil
  idx = 1

  while byte = markup.getbyte(idx)
    return false if byte != DOT && (byte < ZERO || byte > NINE)

    idx += 1

    # we found our number and now we are just scanning the rest of the string
    next if num_end_pos

    if byte == DOT
      if is_integer == false
        num_end_pos = idx - 1
      else
        is_integer = false
        last_dot_pos = idx
      end
    end
  end

  num_end_pos = markup.length if idx == markup.length

  return markup.to_i if is_integer

  if num_end_pos
    # number ends with a number "123.123"
    markup.byteslice(0, num_end_pos).to_f
  elsif last_dot_pos
    markup.byteslice(0, last_dot_pos).to_f
  else
    # we should never reach this point
    false
  end
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to profile to find what's slow, but I think avoiding the mutation of StringScanner is a great improvement. Maybe try profiling with Vernier?

If I'm reading this correctly, it looks like this function requires that the whole string must contain 0-9 or a . character. We seem to exit early if return false if byte != DOT && (byte < ZERO || byte > NINE) happens.

If that's true, then we could do a fast path case that's like return markup.to_i if markup.match?(/\A[0-9]+\z/).

Something like this:

def parse_number(markup)
  # the first byte must be a digit, a period, or  a dash
  byte = markup.getbyte(0)
  return false if byte != DASH && byte != DOT && (byte < ZERO || byte > NINE)

  # This handles all integer cases
  return markup.to_i if markup.match?(/\A[0-9]+\z/)

  # rest of method
end

It might be possible to do the float parsing without the loop, but I don't really understand what it's doing. It looks like it's trying to handle a number with multiple dots in it? I don't understand the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it's trying to handle a number with multiple dots in it? I don't understand the case.
Yup, since "123.456.789".to_f returns 123.456 in Ruby, we need to replicate that logic in Liquid.

The original Tokenizer actually parses the number with the Regexs:

...
case markup
when INTEGERS_REGEX
  Regexp.last_match(1).to_i
when RANGES_REGEX
  RangeLookup.parse(Regexp.last_match(1), Regexp.last_match(2))
when FLOATS_REGEX
  Regexp.last_match(1).to_f
else
...

with this approach of using Regex, we are scanning the String twice for parsing a number. (Once with INTEGERS_REGEX and another one with FLOATS_REGEX)
By scanning the String once, Tokenizer2 is 1.7x faster than Tokenizer1 for parsing numbers.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, since "123.456.789".to_f returns 123.456 in Ruby, we need to replicate that logic in Liquid.

Do we really need to replicate all logic from Ruby? For example "1e10".to_f works in Ruby, but it's not supported here. Also, if it's following the same rules that the lexer uses, I don't think we would encounter a :number token that contains multiple .:

$ ruby -I lib -r liquid/lexer -e'p Liquid::Lexer.new("12.34.56").tokenize'
[[:number, "12.34"], [:dot, "."], [:number, "56"], [:end_of_string]]

Are these strings coming from the lexer or somewhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to replicate all logic from Ruby?

You are right, we don't always follow the Ruby's number parsing logic... 😅

So... Liquid has 2 parsing modes, lax and strict parsing mode. (lax is our legacy parser)

In strict mode, templates like {{ 123.123.123 }} gets tokenized by Lexer#tokenize first, and it will raise an error

Liquid syntax error: Expected end_of_string but found dot in "{{ 123.123.123 }}"

However, in lax mode, {{ 123.123.123 }} template gets parsed by Expression#parse without using the Lexer.

Since the Expression class' FLOATS_REGEX is /\A(-?\d[\d\.]+)\z/, it allows multiple dots in the float :(
image


class << self
def string_scanner
@string_scanner ||= StringScanner.new("")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a problem creating a new StringScanner object each time we need one? It seems like we're just caching the object and then mutating it later. That could lead to thread safety problems down the road. Usually I like to create one StringScanner object per buffer I'm parsing, which should amortize the cost of the allocation as long as it's reused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I am also worried about thread safety problem in the future :(
In local testing, we see around ~5.6% performance improvement, but in production, we were seeing performance regression due to increased GC calls.

Also, allocating a new StringScanner object was more expensive than I have thought 🤔
For ParseContext#parse_expression`, allocating a new StringScanner took 6.7% of the total time:
image

For thread safety in the future, we could create a pool of StringScanner later 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For those, who uses web servers like Puma, recycling the StringScanner would become problematic... I just added StringScannerPool to this PR 👍
Thank you for catching this!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For ParseContext#parse_expression`, allocating a new StringScanner took 6.7% of the total time

This is a recursive descent parser though right? Meaning parse_expression will recursively call parse_expression? In other words, does parse_expression get called many times per document?

Ideally we would only create one StringScanner object per document we parse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup - parse_expression is a recursive function. The function always finish using its StringScanner before it calls itself, so we are safe to reuse the StringScanner.
The PR has been updated to create one StringScanner object per document which is stored in the ParseContext 👍

Copy link

@tenderlove tenderlove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use a pool like this. It feels like a memory leak waiting to happen. Race conditions could cause us to create many more StringScanner object than are necessary, and since we have no cap on the @ss_pool we'll retain a bunch of string scanner objects that we don't need.

I really think we need to ensure that only one string scanner object is created per document we parse. I have a hard time believing that allocating one string scanner per document would be a significant source of overhead. Even if it is, it seems like we could refactor the tokenize method to return fewer arrays and make up for any "lost time".

lib/liquid/string_scanner_pool.rb Outdated Show resolved Hide resolved
lib/liquid/string_scanner_pool.rb Outdated Show resolved Hide resolved
@ggmichaelgo ggmichaelgo force-pushed the fast-expression-parse branch 2 times, most recently from a37bdde to 76c5f09 Compare November 18, 2024 03:44
@@ -20,7 +22,7 @@ class Expression
# malicious input as described in https://github.com/Shopify/liquid/issues/1357
RANGES_REGEX = /\A\(\s*(?>(\S+)\s*\.\.)\s*(\S+)\s*\)\z/

def self.parse(markup)
def self.parse(markup, _ss = nil)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[note] Providing default StringScanner value makes the Expression class compatible with Liquid-C

INTEGER_REGEX = /\A(-?\d+)\z/
FLOAT_REGEX = /\A(-?\d+)\.\d+\z/

CACHE = LruRedux::Cache.new(10_000) # most themes would have less than 2,000 unique expression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get 90% of the benefit by hardcoding a few common expressions? I'm not sure how I feel about a global cache like this given the objects aren't frozen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... I could see how it could go wrong... 😓
I have updated the PR to create an Expression cache per document parsing.
The caches do fill up quickly, and the cache hit rate stays relatively high (~60% for Dawn)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: We will have to update Liquid-C to actually provide Expression cache
Shopify/liquid-c#227

lib/liquid/lexer.rb Outdated Show resolved Hide resolved
@@ -176,7 +180,7 @@ def []=(key, value)
# Example:
# products == empty #=> products.empty?
def [](expression)
evaluate(Expression.parse(expression))
evaluate(Expression.parse(expression, @string_scanner))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: I think Context#[] should just be deprecated honestly, it's a very deceptive performance footgun.

Instead, we can add a new method: Context#parse_and_evaluate

@ggmichaelgo ggmichaelgo force-pushed the fast-expression-parse branch 2 times, most recently from 0e4da50 to aa7f926 Compare November 22, 2024 02:03
# constructing new StringScanner in Lexer, Tokenizer, etc is expensive
# This StringScanner will be shared by all of them
@string_scanner = StringScanner.new("")
@expression_cache = LruRedux::Cache.new(10_000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this an option on the parse context

Suggested change
@expression_cache = LruRedux::Cache.new(10_000)
@expression_cache = options[:expression_cache]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also have default expression caching option.

@expression_cache = if options[:expression_cache] != false
  options[:expression_cache] || LruRedux::Cache.new(10_000)
end

With this way, people can re-use their expression caches between templates or disable the expression caching.

@ggmichaelgo ggmichaelgo force-pushed the fast-expression-parse branch from 463231b to d48708a Compare January 7, 2025 18:32
@ggmichaelgo ggmichaelgo force-pushed the fast-expression-parse branch from 7914596 to 97cada6 Compare January 10, 2025 18:24
@ggmichaelgo ggmichaelgo merged commit 323951b into main Jan 10, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants